Semi-Supervised Model Training for Unbounded Conversational Speech Recognition
نویسندگان
چکیده
For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state of the art models. Collection of labeled conversational audio however, is prohibitively expensive, laborious and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with sufficient accuracy in the unbounded space of conversational speech. These corpora are also timeworn due to dated acoustic telephony features and the rapid advancement of colloquial vocabulary and idiomatic speech over the last decades. Utilizing the colossal scale of our unlabeled telephony dataset, we propose a technique to construct a modern, high quality conversational speech training corpus on the order of hundreds of millions of utterances (or tens of thousands of hours) for both acoustic and language model training. We describe the data collection, selection and training, evaluating the results of our updated speech recognition system on a test corpus of 7K manually transcribed utterances. We show relative word error rate (WER) reductions of {35%, 19%} on {agent, caller} utterances over our seed model and 5% absolute WER improvements over IBM Watson STT on this conversational speech task.
منابع مشابه
Semi - Supervised Learning for Acoustic
Enormous amounts of audio recordings of human speech are essential ingredients for building reliable statistical models for many speech applications, such as automatic speech recognizers and automatic prosody detector. However, most of these speech data are not being utilized because they lack transcriptions. The goal of this thesis is to use untranscribed (unlabeled) data to improve the perfor...
متن کاملDiscriminative optimization of large vocabulary Mandarin conversational speech recognition system
This paper examines techniques of discriminative optimization for acoustic model, including both HMM parameters and linear transforms, in the context of HUB5 Mandarin large vocabulary speech recognition task, with the aim to partly solve the problems brought by the sparseness and the highly ambiguous nature of the telephony conversational speech data. Three techniques are studied: MMI training ...
متن کاملSemi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Arabic speech recognition suffers from the scarcity of properly labeled data. In this project, we introduce a pipeline that performs semi-supervised segmentation of audio then— after hand-labeling a small dataset—feeds labeled segments to a supervised learning framework to select, through many rounds of hyperparameter optimization, an ensemble of models to infer labels for a larger dataset; usi...
متن کاملSemi-Supervised Training of Language Model on Spanish Conversational Telephone Speech Data
This work addresses one of the common issues arising when building a speech recognition system within a low-resourced scenario adapting the language model on unlabeled audio data. The proposed methodology makes use of such data by means of semisupervised learning. Whilst it has been proven that adding system-generated labeled data for acoustic modeling yields good results, the benefits of addin...
متن کاملA New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1705.09724 شماره
صفحات -
تاریخ انتشار 2017